Shape Completion using 3D-Encoder-Predictor CNNs and Shape Synthesis
We introduce a data-driven approach to complete partial 3D shapes through a
combination of volumetric deep neural networks and 3D shape synthesis. From a
partially-scanned input shape, our method first infers a low-resolution -- but
complete -- output. To this end, we introduce a 3D-Encoder-Predictor Network
(3D-EPN) which is composed of 3D convolutional layers. The network is trained
to predict and fill in missing data, and operates on an implicit surface
representation that encodes both known and unknown space. This allows us to
predict global structure in unknown areas at high accuracy. We then correlate
these intermediary results with 3D geometry from a shape database at test time.
In a final pass, we propose a patch-based 3D shape synthesis method that
imposes the 3D geometry from these retrieved shapes as constraints on the
coarsely-completed mesh. This synthesis process enables us to reconstruct
fine-scale detail and generate high-resolution output while respecting the
global mesh structure obtained by the 3D-EPN. Although our 3D-EPN outperforms
state-of-the-art completion methods, the main contribution of our work lies in
the combination of a data-driven shape predictor and analytic 3D shape
synthesis. In our results, we show extensive evaluations on a newly-introduced
shape completion benchmark for both real-world and synthetic data.
ScanComplete: Large-Scale Scene Completion and Semantic Segmentation for 3D Scans
We introduce ScanComplete, a novel data-driven approach for taking an
incomplete 3D scan of a scene as input and predicting a complete 3D model along
with per-voxel semantic labels. The key contribution of our method is its
ability to handle large scenes with varying spatial extent, managing the cubic
growth in data size as scene size increases. To this end, we devise a
fully-convolutional generative 3D CNN model whose filter kernels are invariant
to the overall scene size. The model can be trained on scene subvolumes but
deployed on arbitrarily large scenes at test time. In addition, we propose a
coarse-to-fine inference strategy in order to produce high-resolution output
while also leveraging large input context sizes. In an extensive series of
experiments, we carefully evaluate different model design choices, considering
both deterministic and probabilistic models for completion and semantic
inference. Our results show that we outperform other methods not only in the
size of the environments handled and processing efficiency, but also with
regard to completion quality and semantic segmentation performance by a
significant margin.
Comment: Video: https://youtu.be/5s5s8iH0NF
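The size-invariance property described in the ScanComplete abstract can be illustrated with a minimal sketch. It is not the authors' model: it assumes a single 3D convolution with no padding, implemented naively in NumPy, and the helper name conv3d_valid and all array sizes are hypothetical. It shows only that the same kernel runs on volumes of any extent, and that the output on a subvolume matches the corresponding crop of the output on the full scene.

```python
import numpy as np

def conv3d_valid(volume, kernel):
    """Naive 'valid' 3D cross-correlation; accepts any volume at least as large as the kernel."""
    kd, kh, kw = kernel.shape
    d, h, w = volume.shape
    out = np.zeros((d - kd + 1, h - kh + 1, w - kw + 1))
    for z in range(out.shape[0]):
        for y in range(out.shape[1]):
            for x in range(out.shape[2]):
                out[z, y, x] = np.sum(volume[z:z + kd, y:y + kh, x:x + kw] * kernel)
    return out

rng = np.random.default_rng(0)
kernel = rng.standard_normal((3, 3, 3))    # hypothetical filter; its shape is independent of scene size
scene = rng.standard_normal((8, 10, 12))   # stand-in for a full voxelized scene
subvolume = scene[2:7, 3:8, 4:9]           # 5x5x5 crop, as might be used during training

full_out = conv3d_valid(scene, kernel)     # same kernel applied to the whole scene
sub_out = conv3d_valid(subvolume, kernel)  # same kernel applied to the crop

# The crop's output equals the matching region of the full-scene output.
same = np.allclose(sub_out, full_out[2:5, 3:6, 4:7])
```

Because the filter weights do not depend on the input extent, nothing in the layer has to change between training on crops and running on a larger scene; only memory and compute grow with the volume.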
Learning to Navigate the Energy Landscape
In this paper, we present a novel and efficient architecture for addressing
computer vision problems that use 'Analysis by Synthesis'. Analysis by
synthesis involves the minimization of the reconstruction error which is
typically a non-convex function of the latent target variables.
State-of-the-art methods adopt a hybrid scheme where discriminatively trained
predictors like Random Forests or Convolutional Neural Networks are used to
initialize local search algorithms. While these methods have been shown to
produce promising results, they often get stuck in local optima. Our method
goes beyond the conventional hybrid architecture by not only proposing multiple
accurate initial solutions but by also defining a navigational structure over
the solution space that can be used for extremely efficient gradient-free local
search. We demonstrate the efficacy of our approach on the challenging problem
of RGB Camera Relocalization. To make the RGB camera relocalization problem
particularly challenging, we introduce a new dataset of 3D environments which
are significantly larger than those found in other publicly-available datasets.
Our experiments reveal that the proposed method is able to achieve
state-of-the-art camera relocalization results. We also demonstrate the
generalizability of our approach on Hand Pose Estimation and Image Retrieval
tasks.
ROCA: Robust CAD Model Retrieval and Alignment from a Single Image
We present ROCA, a novel end-to-end approach that retrieves and aligns 3D CAD
models from a shape database to a single input image. This enables 3D
perception of an observed scene from a 2D RGB observation, characterized as a
lightweight, compact, clean CAD representation. Core to our approach is our
differentiable alignment optimization based on dense 2D-3D object
correspondences and Procrustes alignment. ROCA can thus provide a robust CAD
alignment while simultaneously informing CAD retrieval by leveraging the 2D-3D
correspondences to learn geometrically similar CAD models. Experiments on
challenging, real-world imagery from ScanNet show that ROCA significantly
improves on the state of the art, from 9.5% to 17.6% in retrieval-aware CAD
alignment accuracy.
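For readers unfamiliar with Procrustes alignment, the sketch below shows the classical closed-form solution for a rigid transform between corresponding 3D point sets (the SVD-based Kabsch solution). It is an illustration under stated assumptions, not ROCA's implementation: the abstract describes a differentiable optimization driven by dense 2D-3D correspondences, whereas this version is unweighted, uses plain NumPy, and assumes the correspondences are already given as matched 3D points. The function name procrustes_rigid and the test data are hypothetical.

```python
import numpy as np

def procrustes_rigid(src, dst):
    """Least-squares rotation R and translation t such that dst ~= src @ R.T + t."""
    src_mean = src.mean(axis=0)
    dst_mean = dst.mean(axis=0)
    # 3x3 cross-covariance between the centered point sets.
    H = (src - src_mean).T @ (dst - dst_mean)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_mean - R @ src_mean
    return R, t

# Synthetic check: rotate by 30 degrees about z, translate, then recover the transform.
angle = np.deg2rad(30.0)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([0.5, -1.0, 2.0])

rng = np.random.default_rng(0)
cad_points = rng.standard_normal((10, 3))       # stand-in for points on a CAD model
scene_points = cad_points @ R_true.T + t_true   # their noise-free observed positions

R_est, t_est = procrustes_rigid(cad_points, scene_points)
```

Every step in this solution is a differentiable matrix operation, which is one reason Procrustes-style solvers are convenient inside end-to-end trained pipelines.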
Language-Grounded Indoor 3D Semantic Segmentation in the Wild
Recent advances in 3D semantic segmentation with deep neural networks have
shown remarkable success, with rapid performance increase on available
datasets. However, current 3D semantic segmentation benchmarks contain only a
small number of categories -- fewer than 30 for ScanNet and SemanticKITTI, for
instance -- which is not enough to reflect the diversity of real environments
(e.g., semantic image understanding covers hundreds to thousands of classes).
Thus, we propose to study a larger vocabulary for 3D semantic segmentation with
a new extended benchmark on ScanNet data with 200 class categories, an order of
magnitude more than previously studied. This large number of class categories
also induces a large natural class imbalance, both of which are challenging for
existing 3D semantic segmentation methods. To learn more robust 3D features in
this context, we propose a language-driven pre-training method to encourage
learned 3D features that might have limited training examples to lie close to
their pre-trained text embeddings. Extensive experiments show that our approach
consistently outperforms state-of-the-art 3D pre-training for 3D semantic
segmentation on our proposed benchmark (+9% relative mIoU), including
limited-data scenarios with +25% relative mIoU using only 5% annotations.
Comment: 23 pages, 8 figures, project page:
https://rozdavid.github.io/scannet20
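The idea of pulling 3D features toward pre-trained text embeddings can be sketched as follows. The abstract does not specify the loss, so this is an assumed formulation rather than the authors' objective: cosine similarities between each 3D feature and every class text embedding are treated as logits, and a softmax cross-entropy rewards similarity to the embedding of the correct class. The function name text_anchor_loss, the temperature value, and the toy embeddings are all hypothetical.

```python
import numpy as np

def text_anchor_loss(features, labels, text_embeddings, temperature=0.1):
    """Cross-entropy over cosine similarities between 3D features and class text embeddings."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    t = text_embeddings / np.linalg.norm(text_embeddings, axis=1, keepdims=True)
    logits = f @ t.T / temperature                # (num_points, num_classes)
    logits = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

# Toy stand-in for frozen text embeddings: 4 classes, 8 dimensions, mutually orthogonal.
text_embeddings = np.eye(4, 8)
labels = np.array([0, 1, 2, 3, 1])

aligned = text_embeddings[labels]               # features sitting on their class anchors
misaligned = text_embeddings[(labels + 1) % 4]  # features sitting on the wrong anchors

loss_aligned = text_anchor_loss(aligned, labels, text_embeddings)
loss_misaligned = text_anchor_loss(misaligned, labels, text_embeddings)
```

In this formulation the text embeddings act as fixed anchors, so a class with few 3D training examples still has a well-defined target in feature space; the loss is low when features sit on their class anchor and high when they sit on another class's anchor.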